Abstract¶
The project aims to visualize the distribution of fast food restaurants across the United States and analyze their nutritional content. Using datasets containing information on over 10,000 fast food locations and detailed nutritional data from major chains, the project will identify geographic hotspots where fast food is highly concentrated. It will also assess the potential health impacts of these hotspots by analyzing the nutritional profiles of the food offered at these locations.
The study will map fast food density across different regions, correlate fast food presence with public health outcomes, and analyze nutritional profiles of popular menu items. This comprehensive analysis will provide valuable insights into the relationship between fast food availability, nutritional quality, and potential public health implications across various U.S. regions and communities.
Importing required libraries and dataset¶
import scipy
import random
import numpy as np
import pandas as pd
import warnings
# Modules for Data visualization
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
SEED = 42
np.random.seed(SEED)
# Ignore potential warnings
warnings.filterwarnings("ignore")
Dataset Information¶
Fast Food Restaurants Across America¶
- Source: Datafiniti's Business Database
- Entries: Over 10,000 fast food restaurant entries
- Attributes:
- Restaurant Name (String)
- Address (String)
- City (String)
- State (String)
- Latitude and Longitude (Float) for geographic mapping
- Categories (String) indicating the type of fast food offered
- Postal Code (String/Integer)
Fast Food Nutrition¶
- Chains Included: McDonald's, Burger King, Wendy's, KFC, Taco Bell, and Pizza Hut
- Entries: 1,072 unique menu items
- Attributes:
- Calories (Integer)
- Calories from Fat (Integer)
- Total Fat (Float)
- Saturated Fat (Float)
- Trans Fat (Float)
- Cholesterol (Float)
- Sodium (Float)
- Carbohydrates (Carbs) (Float)
- Fiber (Float)
- Sugars (Float)
- Protein (Float)
- Weight Watchers Points (Float)
Loading the dataset¶
# Load the CSV Data
df = pd.read_csv(FILE_PATH)
# Transform the column names
df.columns = [name.replace('\n', " ") for name in df.columns]
# A quick look at the data frame
df.sample(10)
| | Company | Item | Calories | Calories from Fat | Total Fat (g) | Saturated Fat (g) | Trans Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbs (g) | Fiber (g) | Sugars (g) | Protein (g) | Weight Watchers Pnts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 170 | McDonald’s | POWERade® Mountain Blast (Medium) | 150 | 0 | 0 | 0 | 0 | 0 | 130 | 39 | 0 | 31 | 0 | 181 |
| 535 | Wendy’s | Crispy Chicken Sandwich | 330 | NaN | 16 | 3 | 0 | 30 | 600 | 33 | 2 | 4 | 14 | 323 |
| 867 | KFC | Tropicana® Fruit Punch (12 fl oz) | 170 | NaN | 0 | 0 | 0 | 0 | 35 | 45 | 0 | 45 | 0 | 215 |
| 351 | Burger King | Crispy Chicken Sandwich | 670 | 370 | 41 | 7 | 0 | 60 | 1080 | 54 | 2 | 8 | 23 | 662 |
| 140 | McDonald’s | Vanilla McCafé® Shake (22 fl oz cup) | 830 | 210 | 24 | 14 | 1.5 | 75 | 270 | 138 | 0 | 103 | 17 | 930 |
| 413 | Burger King | Ham, Egg, & Cheese Biscuit | 400 | 210 | 24 | 12 | 0 | 175 | 1550 | 29 | 1 | 3 | 17 | 398 |
| 362 | Burger King | Spicy Chicken Nuggets- 4pc | 210 | 130 | 15 | 3 | 0 | 20 | 570 | 11 | 2 | 0 | 8 | 205 |
| 764 | KFC | BBQ – Dipping Sauce Cup | 45 | NaN | 0 | 0 | 0 | 0 | 150 | 11 | 0 | 11 | 0 | 56 |
| 523 | Wendy’s | Double Stack | 390 | NaN | 21 | 9 | 1.5 | 90 | 740 | 26 | 1 | 6 | 25 | 380 |
| 985 | Taco Bell | Bean Burrito | 350 | 80 | 9 | 3.5 | 0 | 5 | 1000 | 54 | 11 | 3 | 13 | NaN |
Basic summary statistics and Exploratory Data Analysis (EDA)¶
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1148 entries, 0 to 1147
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Company               1148 non-null   object
 1   Item                  1148 non-null   object
 2   Calories              1147 non-null   object
 3   Calories from Fat     642 non-null    object
 4   Total Fat (g)         1091 non-null   object
 5   Saturated Fat (g)     1091 non-null   object
 6   Trans Fat (g)         1091 non-null   object
 7   Cholesterol (mg)      1147 non-null   object
 8   Sodium (mg)           1147 non-null   object
 9   Carbs (g)             1091 non-null   object
 10  Fiber (g)             1091 non-null   object
 11  Sugars (g)            1147 non-null   object
 12  Protein (g)           1091 non-null   object
 13  Weight Watchers Pnts  887 non-null    object
dtypes: object(14)
memory usage: 125.7+ KB
Finding Null values¶
df.isnull().sum()
Company                   0
Item                      0
Calories                  1
Calories from Fat       506
Total Fat (g)            57
Saturated Fat (g)        57
Trans Fat (g)            57
Cholesterol (mg)          1
Sodium (mg)               1
Carbs (g)                57
Fiber (g)                57
Sugars (g)                1
Protein (g)              57
Weight Watchers Pnts    261
dtype: int64
df['Carbs (g)'].sample(10, random_state=42)
170     39
535     33
867     45
351     54
140    138
413     29
362     11
764     11
523     26
985     54
Name: Carbs (g), dtype: object
Handling null and missing values¶
special_values_collection = {}
# Define a format string for output
fmt = "\t{:25}: {:2}"
# Loop through each column starting from the third column (index 2)
for column in df.columns[2:]:
    print(f"Inspecting: {column}\n")
    # Collect the non-numeric ("special") values found in this column
    special_values_collection[column] = []
    # Initialize counters for special values and null values
    special_value_count = 0
    null_values = df[column].isnull().sum()
    value_counts = df[column].value_counts()
    # Iterate through unique values in the column
    for value in df[column].unique():
        try:
            # Values that convert to float (including NaN) count as numeric
            float(value)
        except (TypeError, ValueError):
            # Conversion failed, so record the value as special
            special_values_collection[column].append(value)
            special_chars = value_counts.get(value)
            special_value_count += special_chars
            # Print the special value and its count
            print(fmt.format("Special value", value))
            print(fmt.format(
                f"Total \'{value}\' Values", special_chars) + "\n")
    # Print the total null values and total missing values (null + special)
    print(fmt.format("Total Null Values", null_values))
    print(fmt.format(
        "Total Missing Values", special_value_count + null_values) + "\n")
Inspecting: Calories
    Special value            :  
    Total ' ' Values         : 14
    Total Null Values        :  1
    Total Missing Values     : 15

Inspecting: Calories from Fat
    Special value            :  
    Total ' ' Values         : 12
    Total Null Values        : 506
    Total Missing Values     : 518

Inspecting: Total Fat (g)
    Special value            :  
    Total ' ' Values         : 12
    Total Null Values        : 57
    Total Missing Values     : 69

Inspecting: Saturated Fat (g)
    Special value            : 5.5 g
    Total '5.5 g' Values     :  1
    Special value            :  
    Total ' ' Values         : 12
    Total Null Values        : 57
    Total Missing Values     : 70

Inspecting: Trans Fat (g)
    Special value            :  
    Total ' ' Values         : 12
    Total Null Values        : 57
    Total Missing Values     : 69

Inspecting: Cholesterol (mg)
    Special value            :  
    Total ' ' Values         : 14
    Special value            : <5
    Total '<5' Values        : 14
    Total Null Values        :  1
    Total Missing Values     : 29

Inspecting: Sodium (mg)
    Special value            :  
    Total ' ' Values         : 14
    Special value            : <1
    Total '<1' Values        :  1
    Total Null Values        :  1
    Total Missing Values     : 16

Inspecting: Carbs (g)
    Special value            :  
    Total ' ' Values         : 12
    Special value            : <1
    Total '<1' Values        :  1
    Total Null Values        : 57
    Total Missing Values     : 70

Inspecting: Fiber (g)
    Special value            :  
    Total ' ' Values         : 12
    Special value            : <1
    Total '<1' Values        : 15
    Total Null Values        : 57
    Total Missing Values     : 84

Inspecting: Sugars (g)
    Special value            :  
    Total ' ' Values         : 14
    Special value            : <1
    Total '<1' Values        : 15
    Total Null Values        :  1
    Total Missing Values     : 30

Inspecting: Protein (g)
    Special value            :  
    Total ' ' Values         : 12
    Total Null Values        : 57
    Total Missing Values     : 69

Inspecting: Weight Watchers Pnts
    Special value            :  
    Total ' ' Values         : 11
    Total Null Values        : 261
    Total Missing Values     : 272
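The same inspection can be sketched more compactly with `pd.to_numeric(..., errors="coerce")`, which turns every unparseable entry into NaN; comparing the result against the original nulls isolates the special values. The toy series below is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy column mimicking the kinds of entries found above
toy = pd.Series(["150", "\xa0", "<1", None, "5.5 g", "330"], name="Toy (g)")

# Coerce to numbers; anything unparseable becomes NaN
parsed = pd.to_numeric(toy, errors="coerce")

# "Special" entries are present in the raw data but failed to parse
is_special = parsed.isna() & toy.notna()
special_counts = toy[is_special].value_counts()

print(special_counts)
print("Null values:", int(toy.isna().sum()))
print("Missing (null + special):", int(parsed.isna().sum()))
```

This trades the explicit `try`/`except` loop for a vectorized call, at the cost of not seeing each special value printed as it is found.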
Dealing with duplicates, Null and NaN values¶
null_values = df['Carbs (g)'].isnull().sum()
print(f"Number of Null/NaN values: {null_values}")
Number of Null/NaN values: 57
df.drop_duplicates(inplace=True)
print(f"Total number of columns/features : {len(df.columns)}")
# Dropping the complete column
df.drop(columns=['Weight Watchers Pnts', 'Protein (g)'], inplace=True)  # columns= already implies axis=1
print(f"Total number of columns/features(updated) : {len(df.columns)}")
Total number of columns/features : 14
Total number of columns/features(updated) : 12
def update_column(column: str) -> None:
    """Fill nulls and special values in `column`, then convert it to float."""
    special_chars = special_values_collection[column]
    # Mean of the values that are already numeric
    values = df[~df[column].isin(special_chars)][column].dropna().astype(float)
    mean_value = round(values.mean(), 3)
    print(f"Set of Special Characters: {special_chars}")
    print(f"Mean Value: {mean_value}\n")
    print(f"Initial count of null values in {column} column: {df[column].isnull().sum()}")
    # Assign back instead of calling fillna(inplace=True) on a column selection,
    # which newer pandas versions warn about
    df[column] = df[column].fillna(mean_value)
    print(f"Count of null values after filling in {column} column: {df[column].isnull().sum()}")
    count_before = tuple(df[column].value_counts()[char] for char in special_chars)
    print(f"Count of special characters {special_chars} before replacement: {count_before}")
    for special_char in special_chars:
        # "<1" becomes 0; every other special value (including "<5" and "5.5 g")
        # is replaced by the column mean
        replacement = 0 if special_char == "<1" else mean_value
        df[column] = df[column].replace(special_char, replacement)
    count_after = tuple(df[column].value_counts().get(char, 0) for char in special_chars)
    print(f"Count of special characters {special_chars} after replacement: {count_after}")
    print(f"\nSample of 10 values from the {column} column:\n{df[column].sample(10)}")
    df[column] = df[column].astype(float)
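Note that `update_column` replaces every special value other than `<1` with the column mean, so `<5` and `5.5 g` also become the mean. One alternative, sketched below, parses such strings directly. The helper name `parse_nutrient`, the `censored_factor` parameter, and the choice to map `<n` to 0 by default are illustrative assumptions, not part of the original notebook.

```python
import math
import re

# Optional "<", a number, and an optional trailing unit (g or mg)
_NUMBER = re.compile(r"^\s*(<)?\s*(\d+(?:\.\d+)?)\s*(?:g|mg)?\s*$")

def parse_nutrient(value, censored_factor=0.0):
    """Parse one raw cell into a float, or NaN if nothing numeric is present.

    '<n' becomes n * censored_factor (0.0 mirrors the notebook's '<1' -> 0 rule);
    a trailing unit such as in '5.5 g' is stripped.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    match = _NUMBER.match(str(value).replace("\xa0", " "))
    if match is None:
        return math.nan
    number = float(match.group(2))
    return number * censored_factor if match.group(1) else number

for raw in ["330", "<5", "5.5 g", "\xa0", None]:
    print(repr(raw), "->", parse_nutrient(raw))
```

A column could then be converted with `df[column].map(parse_nutrient)` and any remaining NaN values filled with the mean, as before.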
Visualization: Company Frequency Distribution¶
Histogram¶
This visualization is a histogram of the number of menu items recorded for each fast food company in the dataset. Because `Company` is a categorical variable, the chart is effectively a bar chart of counts, and `px.histogram` does not add a Kernel Density Estimate (KDE). The bars are color-coded by company, making it easy to see which companies have the most entries in the dataset.
- Purpose: To visually represent the distribution of fast food chains in the dataset and identify which companies are most prevalent.
- Key Features:
- X-axis: Represents different fast food companies.
- Y-axis: Shows the frequency count of each company.
- Text Auto: Displays frequency counts directly on the bars for clarity.
- Color Coding: Differentiates companies for better visual distinction.
- Layout Customization: Includes axis titles and font adjustments for improved readability.
Pie Chart¶
The pie chart complements the histogram by providing a percentage-based view of company distribution within the dataset. It highlights each company's share of the total entries, offering a quick overview of market presence.
- Purpose: To depict the proportional representation of each fast food company in the dataset.
- Key Features:
- Hole: Creates a donut chart style, which can be more visually appealing and easier to interpret.
- Text Info: Displays both percentage and label information for each slice.
- Hover Info: Provides additional details such as label, percentage, and value when hovering over slices.
hist = px.histogram(df, x='Company', text_auto=True,
                    title="Company Frequency Distribution (Histogram)", color="Company")
# Customize layout
hist.update_layout(
    xaxis_title="Companies",
    yaxis_title="Frequency Count",
    font=dict(size=12, color="black"),  # Set font color and size
    showlegend=False,  # Hide legend for cleaner look
)
# The keyword is `renderer`; a misspelled keyword is silently ignored
hist.show(renderer='colab')
# Calculate company value counts
company_value_counts = df['Company'].value_counts()
# Create a pie chart
pie_chart = px.pie(company_value_counts,
                   names=company_value_counts.index,
                   values=company_value_counts.values,
                   hole=0.4,
                   height=600,
                   title="Company Frequency Distribution (Pie Chart)",
                   labels={'index': 'Companies', 'value': 'Frequency Count'})
pie_chart.update_traces(
    textinfo='percent+label',
    hoverinfo='label+percent+value',  # Display additional info on hover
    textfont=dict(size=12),  # Set font size for text labels
)
pie_chart.show(renderer='colab')
Findings from the Visualizations¶
Histogram: Company Frequency Distribution¶
- Dominance of McDonald's: The histogram clearly shows that McDonald’s has the highest frequency in the dataset, with 329 entries, significantly outpacing other fast food chains.
- Other Major Players:
- KFC and Taco Bell follow with 218 and 183 entries, respectively.
- Burger King also has 183 entries, while Wendy’s has slightly fewer at 154.
- Pizza Hut has the least representation with only 74 entries.
- Insights:
- McDonald’s lead reflects the number of menu items recorded for it in this dataset; item counts alone do not measure market presence or regional availability.
- Pizza Hut’s low entry count means fewer of its menu items were recorded, which does not by itself imply a smaller footprint than other chains.
Pie Chart: Company Frequency Distribution¶
- Proportional Representation:
- McDonald’s accounts for 28.8% of the total dataset, reinforcing its leading position.
- KFC (19.1%), Taco Bell (16%), and Burger King (16%) hold similar shares, indicating competitive parity among these chains.
- Wendy’s contributes 13.5%, while Pizza Hut represents only 6.49% of the dataset.
- Insights:
- The pie chart complements the histogram by visualizing proportional distribution, making it easier to understand each company’s relative share in the dataset.
- Because McDonald’s contributes the most rows, dataset-wide totals and averages are weighted toward its menu.
Overall Observations¶
- Both visualizations show McDonald’s with the most entries in the dataset, making it a natural focus for further analysis of nutritional content.
- KFC, Taco Bell, Burger King, and Wendy’s each contribute between 13.5% and 19.1% of the entries, so comparisons among them rest on broadly similar sample sizes.
- Pizza Hut’s smaller share (74 entries) means statistics for it rest on the fewest observations.
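The percentages in the pie chart follow directly from the entry counts reported above. A quick check, using those counts as literal inputs (the variable names are illustrative):

```python
import pandas as pd

# Entry counts as reported in the histogram findings
counts = pd.Series({
    "McDonald's": 329, "KFC": 218, "Taco Bell": 183,
    "Burger King": 183, "Wendy's": 154, "Pizza Hut": 74,
})

# Each company's share of all entries, in percent
shares = (counts / counts.sum() * 100).round(2)

print("Total entries:", counts.sum())
print(shares)
```

With the cleaned DataFrame at hand, `df['Company'].value_counts(normalize=True)` yields the same proportions without typing the counts in.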
Data Cleaning and Preprocessing¶
for column in df.columns[2:]:
    print(f"Column: {column}")
    update_column(column)
    print()
Column: Calories Set of Special Characters: ['\xa0'] Mean Value: 287.909 Initial count of null values in Calories column: 1 Count of null values after filling in Calories column: 0 Count of special characters ['\xa0'] before replacement: (14,) Count of special characters ['\xa0'] after replacement: (0,) Sample of 10 values from the Calories column: 772 130 440 720 56 370 915 180 133 860 919 370 307 60 726 80 731 1200 402 240 Name: Calories, dtype: object Column: Calories from Fat Set of Special Characters: ['\xa0'] Mean Value: 118.034 Initial count of null values in Calories from Fat column: 506 Count of null values after filling in Calories from Fat column: 0 Count of special characters ['\xa0'] before replacement: (12,) Count of special characters ['\xa0'] after replacement: (0,) Sample of 10 values from the Calories from Fat column: 1032 118.034 705 118.034 726 118.034 817 118.034 289 100 413 210 311 130 476 0 196 100 1113 118.034 Name: Calories from Fat, dtype: object Column: Total Fat (g) Set of Special Characters: ['\xa0'] Mean Value: 11.706 Initial count of null values in Total Fat (g) column: 57 Count of null values after filling in Total Fat (g) column: 0 Count of special characters ['\xa0'] before replacement: (12,) Count of special characters ['\xa0'] after replacement: (0,) Sample of 10 values from the Total Fat (g) column: 642 12 171 0 455 11.706 593 0 717 8 925 13 1118 3.5 287 8 998 17 936 27 Name: Total Fat (g), dtype: object Column: Saturated Fat (g) Set of Special Characters: ['5.5 g', '\xa0'] Mean Value: 4.077 Initial count of null values in Saturated Fat (g) column: 57 Count of null values after filling in Saturated Fat (g) column: 0 Count of special characters ['5.5 g', '\xa0'] before replacement: (1, 12) Count of special characters ['5.5 g', '\xa0'] after replacement: (0, 0) Sample of 10 values from the Saturated Fat (g) column: 116 9 873 0 80 8 96 1.5 1030 4.077 1021 4.077 22 4.5 14 10 300 3.5 584 0 Name: Saturated Fat (g), dtype: object 
Column: Trans Fat (g) Set of Special Characters: ['\xa0'] Mean Value: 0.141 Initial count of null values in Trans Fat (g) column: 57 Count of null values after filling in Trans Fat (g) column: 0 Count of special characters ['\xa0'] before replacement: (12,) Count of special characters ['\xa0'] after replacement: (0,) Sample of 10 values from the Trans Fat (g) column: 280 0 875 0 973 0 478 0 510 0 101 0 878 0 742 0 856 0 747 0 Name: Trans Fat (g), dtype: object Column: Cholesterol (mg) Set of Special Characters: ['\xa0', '<5'] Mean Value: 40.742 Initial count of null values in Cholesterol (mg) column: 1 Count of null values after filling in Cholesterol (mg) column: 0 Count of special characters ['\xa0', '<5'] before replacement: (14, 14) Count of special characters ['\xa0', '<5'] after replacement: (0, 0) Sample of 10 values from the Cholesterol (mg) column: 344 35 675 55 189 30 1135 30 82 300 33 45 22 30 123 60 778 0 1008 110 Name: Cholesterol (mg), dtype: object Column: Sodium (mg) Set of Special Characters: ['\xa0', '<1'] Mean Value: 428.477 Initial count of null values in Sodium (mg) column: 1 Count of null values after filling in Sodium (mg) column: 0 Count of special characters ['\xa0', '<1'] before replacement: (14, 1) Count of special characters ['\xa0', '<1'] after replacement: (0, 0) Sample of 10 values from the Sodium (mg) column: 248 85 1034 54 730 2590 457 120 471 55 60 180 746 1750 890 750 986 430 1101 370 Name: Sodium (mg), dtype: object Column: Carbs (g) Set of Special Characters: ['\xa0', '<1'] Mean Value: 39.06 Initial count of null values in Carbs (g) column: 57 Count of null values after filling in Carbs (g) column: 0 Count of special characters ['\xa0', '<1'] before replacement: (12, 1) Count of special characters ['\xa0', '<1'] after replacement: (0, 0) Sample of 10 values from the Carbs (g) column: 568 43 194 8 410 31 369 31 255 29 517 68 94 45 169 27 246 31 226 19 Name: Carbs (g), dtype: object Column: Fiber (g) Set of Special Characters: 
['\xa0', '<1'] Mean Value: 1.461 Initial count of null values in Fiber (g) column: 57 Count of null values after filling in Fiber (g) column: 0 Count of special characters ['\xa0', '<1'] before replacement: (12, 15) Count of special characters ['\xa0', '<1'] after replacement: (0, 0) Sample of 10 values from the Fiber (g) column: 789 0 331 2 320 1 1023 1.461 340 2 901 6 151 0 545 7 176 0 532 0 Name: Fiber (g), dtype: object Column: Sugars (g) Set of Special Characters: ['\xa0', '<1'] Mean Value: 24.153 Initial count of null values in Sugars (g) column: 1 Count of null values after filling in Sugars (g) column: 0 Count of special characters ['\xa0', '<1'] before replacement: (14, 15) Count of special characters ['\xa0', '<1'] after replacement: (0, 0) Sample of 10 values from the Sugars (g) column: 782 228 777 8 222 37 82 7 1099 1 963 38 125 63 970 41 953 0 153 40 Name: Sugars (g), dtype: object
Basic Visualizations¶
Histogram and Pie Chart: Distribution of Feature by Company¶
The create_histogram_and_pie function generates a histogram and pie chart to visualize the distribution of a specified nutritional feature across different fast food companies. The histogram provides a detailed view of how each company contributes to the total values of the feature, while the pie chart highlights the proportional contribution of each company. These visualizations are crucial for understanding which companies dominate specific nutritional metrics, aiding in comparative analysis.
Violin Plot: Feature Distribution with Respect to Company¶
The create_violin_plot function creates a violin plot that displays the distribution of a specified nutritional feature across different companies. This plot combines box plot and density plot elements, offering insights into the spread and frequency of data points. It helps identify variations in nutritional content among companies, highlighting outliers and common value ranges.
Box Plot: Feature Distribution with Respect to Company¶
The create_box_plot function generates a box plot to depict the distribution of a specified feature across different companies. Box plots are effective for visualizing the central tendency and variability of data, as well as identifying outliers. This visualization helps compare nutritional content across companies, providing a clear view of median values and interquartile ranges.
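As a rough illustration of what a box plot flags as an outlier: by the usual convention, points lying more than 1.5 times the interquartile range beyond the quartiles are drawn individually. The values below are made up, and quartile conventions differ slightly between libraries, so this is a sketch rather than a reproduction of Plotly's exact computation.

```python
import numpy as np

# Made-up saturated fat values in grams
values = np.array([0, 2, 3, 3.5, 4, 4.5, 5, 6, 8, 30], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences beyond which points count as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers)
```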
Box Plot: Cholesterol Distribution for Specific Company¶
The plot_box function focuses on visualizing cholesterol distribution specifically for one company using a box plot. This targeted analysis allows for an in-depth look at how cholesterol levels vary within a single company's offerings, highlighting any potential health concerns related to high cholesterol items.
Categorized Histogram: Feature Distribution Across Companies¶
The create_categorized_hist function generates a categorized histogram to show the distribution of a specified feature across all companies, with each company represented as a separate facet. This visualization facilitates direct comparison between companies, making it easier to spot trends and differences in nutritional content.
Histogram: General Feature Distribution¶
The create_hist function creates a general histogram of a specified feature across the entire dataset. This visualization provides an overview of how values are distributed without company-specific segmentation, useful for identifying overall trends and patterns in nutritional data.
Correlation Matrix: Spearman and Pearson Correlations¶
The plot_correlation_matrix function visualizes correlation matrices using either Spearman or Pearson methods. These matrices help identify relationships between numerical features in the dataset, revealing potential correlations that could be significant for further analysis.
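As a rough illustration of how the two methods differ: Pearson measures linear association, while Spearman is Pearson computed on ranks and so captures any monotonic relationship. The toy data below is invented for the example.

```python
import numpy as np
import pandas as pd

# Invented, strictly increasing but non-linear relationship
toy = pd.DataFrame({"x": np.arange(1, 11, dtype=float)})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

# Spearman equals Pearson applied to the ranks of each column
pearson_on_ranks = toy.rank().corr(method="pearson").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

For skewed nutritional data with a few very large items, the two matrices can therefore disagree noticeably, which is one reason to look at both.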
Scatter Plot: Relationship Between Two Variables¶
The create_scatter_plot function generates scatter plots to examine relationships between two specified variables. It can include trendlines and color coding by company, providing insights into how different features interact across various fast food chains.
Company-Specific Correlation Analysis¶
The company_specific_corr function focuses on generating correlation matrices for specific companies, using both Spearman and Pearson methods. This allows for detailed analysis of internal relationships between features within individual companies' datasets.
def create_histogram_and_pie(feature_name):
    # Histogram: with both x and y given, bars show the sum of the feature per company
    hist = px.histogram(df, x="Company", y=feature_name,
                        title=f"Distribution of {feature_name} by Company",
                        text_auto=True, nbins=50, color="Company", height=600)
    hist.update_layout(
        xaxis_title="Company",
        yaxis_title=feature_name,
        showlegend=True,
        legend_title="Company"
    )
    hist.update_traces(marker=dict(line=dict(color='white', width=0.5)))
    hist.show()
    # Pie chart: each company's share of the feature total
    pie_chart = px.pie(df, names="Company", values=feature_name,
                       hole=0.4, title=f"Contribution of Each Company to {feature_name}",
                       labels={'Company': 'Companies',
                               feature_name: f'Total {feature_name}'},
                       )
    pie_chart.update_traces(textinfo='percent+label', textfont_size=12)
    pie_chart.update_layout(legend=dict(title='Company'), showlegend=True)
    pie_chart.show()
def create_violin_plot(feature_name):
    violin = px.violin(df, y=feature_name, x="Company",
                       title=f"{feature_name} distribution wrt Company", color="Company", height=600, points="all")
    violin.update_layout(showlegend=False)
    violin.show()
def create_box_plot(feature_name):
    box = px.box(df, y=feature_name, x="Company",
                 title=f"{feature_name} distribution wrt Company", color="Company", height=600, notched=True)
    box.update_layout(showlegend=False)
    box.show()
def plot_box(data, company_name):
    box = px.box(data, x="Cholesterol (mg)", color="Company",
                 title=f"Cholesterol Distribution for {company_name}", height=400, notched=True)
    box.update_layout(showlegend=False)
    box.show()
def create_categorized_hist(feature_name):
    hist = px.histogram(
        df,
        facet_col="Company",
        y=feature_name,
        title=f"{feature_name} Distribution across Companies",
        text_auto=True,
        nbins=50,
        color="Company",
        height=600
    )
    hist.update_layout(
        showlegend=False,
        yaxis_title=feature_name,
    )
    hist.show()
def create_hist(feature_name):
    values = sorted(df[feature_name])
    hist = px.histogram(x=values, marginal='box',
                        title=f"Histogram of {feature_name}", text_auto=True, nbins=50)
    hist.update_layout(
        xaxis_title=feature_name,
        yaxis_title="Frequency",
        showlegend=False
    )
    hist.show()
def plot_correlation_matrix(dataframe=df, correlation_method="spearman", title="Spearman Correlation"):
    # Correlate numeric columns only
    num_df = dataframe.select_dtypes(include=np.number)
    df_corr = num_df.corr(method=correlation_method)
    corr_matrix = np.round(df_corr, 2)
    heatmap = px.imshow(corr_matrix, text_auto=True, height=700, title=title)
    heatmap.show()
def create_scatter_plot(x_var: str, y_var: str, data_frame: pd.DataFrame = df, height: int = 700,
                        trendline: bool = False, color: bool = False):
    scatter_plot = px.scatter(
        data_frame=data_frame, x=x_var, y=y_var,
        color="Company" if color else None,
        trendline="ols" if trendline else None,  # "ols" requires statsmodels
        marginal_x="histogram",
        marginal_y="histogram",
        height=height,
        labels={"Company": "Company", x_var: x_var, y_var: y_var},
        title=f"{x_var} vs {y_var}"
    )
    scatter_plot.update_layout(
        legend_title_text='Company',
        xaxis_title=x_var,
        yaxis_title=y_var,
        title_font_size=16,
        font=dict(family="Arial", size=12)
    )
    scatter_plot.show()
def company_specific_corr(company: str):
    # Select the rows for one company once and reuse them for both methods
    company_df = df[df["Company"] == company]
    plot_correlation_matrix(company_df, title=f"Spearman Correlation ({company})")
    plot_correlation_matrix(company_df, title=f"Pearson Correlation ({company})", correlation_method="pearson")
# Histogram and Pie Chart for Calories
create_histogram_and_pie("Calories")
# Violin Plot for Total Fat (g)
create_violin_plot("Total Fat (g)")
# Box Plot for Saturated Fat (g)
create_box_plot("Saturated Fat (g)")
# Categorized Histogram for Calories
create_categorized_hist("Calories")
# General Histogram for Sugars (g)
create_hist("Sugars (g)")
# Correlation Matrix for the Entire Dataset
plot_correlation_matrix()
# Scatter Plot for Calories vs Total Fat with Trendline
create_scatter_plot(x_var="Calories", y_var="Total Fat (g)", trendline=True)
# Company-Specific Correlation Matrices for KFC
company_specific_corr("KFC")
Bar Chart: Daily Value Percentage of Key Nutrients¶
This code snippet creates a grouped bar chart using Plotly to visualize the daily value percentage (DV%) of key nutrients for six fast food companies. The DV% figures are entered directly in the cell rather than computed from the menu dataset, and cover five nutrients: Calories, Sodium, Total Fat, Cholesterol, and Carbs. The data is grouped by company, allowing for a comparative analysis across Burger King, KFC, McDonald's, Pizza Hut, Taco Bell, and Wendy's.
Key Features:¶
- Grouped Bar Chart: Displays nutrient DV% for each company side by side, facilitating easy comparison.
- Color Coding: Each company is assigned a distinct color to enhance visual differentiation.
- Interactive Elements: Hovering over the bars reveals detailed information about the nutrient percentages for each company.
- Customizable Layout: The chart includes axis titles and a legend for clarity, with adjustable font sizes for better readability.
Insights:¶
This visualization helps identify which fast food chains have higher percentages of certain nutrients relative to daily values. It provides a clear overview of nutritional content across different brands, highlighting potential health concerns associated with high sodium or fat content in fast food offerings.
import pandas as pd
import plotly.express as px
# NOTE: these DV% values are hard-coded here, not derived from the menu dataset
data = {
    "Nutrient": ["Calories DV%", "Sodium DV%", "Total Fat DV%", "Cholesterol DV%", "Carbs DV%"] * 6,
    "DV%": [50, 90, 75, 80, 60,   # Burger King
            45, 85, 70, 75, 55,   # KFC
            60, 95, 80, 90, 65,   # McDonald's
            40, 70, 60, 50, 45,   # Pizza Hut
            55, 88, 72, 78, 58,   # Taco Bell
            50, 92, 78, 85, 62],  # Wendy's
    "Company": ["Burger King"] * 5 + ["KFC"] * 5 + ["McDonald's"] * 5 +
               ["Pizza Hut"] * 5 + ["Taco Bell"] * 5 + ["Wendy's"] * 5
}
# Create a DataFrame under its own name so the cleaned menu DataFrame `df`
# is not overwritten (later cells still filter `df` on "Sodium (mg)")
dv_df = pd.DataFrame(data)
# Create a bar chart
fig = px.bar(
    dv_df,
    x="Nutrient",
    y="DV%",
    color="Company",
    barmode="group",
    title="Daily Value Percentage of Key Nutrients for Fast Food Restaurants",
)
# Customize layout for better readability
fig.update_layout(
    xaxis_title="Nutrient",
    yaxis_title="DV%",
    font=dict(size=12),
    title_font=dict(size=18),
    legend_title="Company",
)
# Show the figure
fig.show()
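The DV% figures in the cell above are typed in by hand. A sketch of how comparable figures could be derived from menu data is shown below. It assumes the U.S. FDA reference daily values for a 2,000-calorie diet (2,300 mg sodium, 78 g total fat, 300 mg cholesterol, 275 g carbohydrate) and uses the mean item per company as the basis; both choices are assumptions, and the three-row `menu` frame is made up for illustration.

```python
import pandas as pd

# Assumed reference daily values (FDA, 2,000-calorie diet)
DAILY_VALUES = {
    "Calories": 2000, "Sodium (mg)": 2300, "Total Fat (g)": 78,
    "Cholesterol (mg)": 300, "Carbs (g)": 275,
}

# Made-up stand-in for the cleaned menu DataFrame
menu = pd.DataFrame({
    "Company": ["A", "A", "B"],
    "Calories": [500.0, 300.0, 1000.0],
    "Sodium (mg)": [1150.0, 1150.0, 2300.0],
    "Total Fat (g)": [39.0, 39.0, 78.0],
    "Cholesterol (mg)": [150.0, 150.0, 30.0],
    "Carbs (g)": [55.0, 55.0, 137.5],
})

# Mean item per company, expressed as a percentage of each daily value
mean_item = menu.groupby("Company")[list(DAILY_VALUES)].mean()
dv_pct = (mean_item / pd.Series(DAILY_VALUES) * 100).round(1)

# Long format suitable for px.bar(..., x="Nutrient", y="DV%", color="Company")
dv_long = dv_pct.reset_index().melt(
    id_vars="Company", var_name="Nutrient", value_name="DV%")
print(dv_long)
```

Using the mean item is only one possible basis; a typical meal (several items) would give larger percentages.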
Tree Map of Fast Food Companies¶
Code Description¶
This code generates an interactive Tree Map using Plotly to visualize the total calorie contribution of fast food companies, with color representing total sodium. The numeric columns (Calories, Sodium (mg), and Sugars (g)) are converted with `pd.to_numeric`, and rows where any of them cannot be parsed are dropped. The data is then grouped by company, summing the three metrics over each brand's items. In the resulting tree map:
- Size of rectangles: Proportional to the total calories contributed by each company.
- Color: Represents each company's total sodium on a continuous gradient scale.
- Hover data: Displays detailed information about calories, sodium, and sugar for each company.
Figure Insights¶
- McDonald's Dominance: McDonald's occupies the largest rectangle, indicating it contributes the highest total calories among the companies in the dataset.
- High Sodium Totals: Companies like KFC and Taco Bell stand out on the sodium color scale. Because the values are sums over menu items, they depend on how many items each company has as well as on how salty individual items are.
- Smaller Contributions: Pizza Hut has a smaller rectangle, reflecting its lower calorie contribution relative to other companies.
- Nutritional Comparisons: The hover functionality allows users to compare key nutritional metrics (calories, sodium, and sugars) across fast food chains.
import pandas as pd
import plotly.express as px
# Load the dataset
data = pd.read_csv("FastFoodNutritionMenuV2.csv")
# Inspect columns to confirm correct column names
# print(data.columns)
# Rename columns to remove unwanted characters or spaces
data.rename(columns=lambda x: x.strip().replace("\n", " ").replace(" ", " "), inplace=True)
# Ensure relevant columns are numeric
numeric_columns = ["Calories", "Sodium (mg)", "Sugars (g)"]
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')
# Drop rows with missing values in numeric columns
data = data.dropna(subset=numeric_columns)
# Group data by company and sum up relevant metrics
grouped_data = data.groupby("Company")[["Calories", "Sodium (mg)", "Sugars (g)"]].sum().reset_index()
# Create an interactive tree map with hover properties
fig = px.treemap(
    grouped_data,
    path=['Company'],  # Hierarchical path
    values='Calories',  # Size of rectangles based on total calories
    color='Sodium (mg)',  # Color based on total sodium
    hover_data={'Sugars (g)': True, 'Calories': True, 'Sodium (mg)': True},  # Additional info on hover
    title="Tree Map of Fast Food Companies"
)
# Show the interactive figure
fig.show()
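Because the tree map sums over menu items, a company with more entries will tend to show larger totals. A per-item mean is one way to compare companies independently of menu size. The sketch below uses an invented frame, and the output column names (`total`, `per_item`, `items`) are hypothetical.

```python
import pandas as pd

# Invented data: company A has more items, company B has saltier items
toy = pd.DataFrame({
    "Company": ["A"] * 4 + ["B"] * 2,
    "Sodium (mg)": [500, 500, 500, 500, 900, 900],
})

# Total, per-item mean, and item count for each company
summary = toy.groupby("Company")["Sodium (mg)"].agg(
    total="sum", per_item="mean", items="count")
print(summary)
```

Here A leads on total sodium while B leads on sodium per item, so the two views can rank companies differently.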
Finding out items containing sodium levels > 2000mg (Daily intake limit)¶
# Sodium threshold used throughout this section, in mg
SODIUM_LIMIT_MG = 2000
# Filter items with sodium above the threshold
high_sodium = df[df["Sodium (mg)"] > SODIUM_LIMIT_MG][["Item", "Company", "Sodium (mg)"]]
# Sort by sodium content descending
high_sodium_sorted = high_sodium.sort_values("Sodium (mg)", ascending=False)
# Display results
print(f"Items exceeding the {SODIUM_LIMIT_MG}mg sodium limit:\n")
for _, row in high_sodium_sorted.iterrows():
    print(f"Company: {row['Company']}")
    print(f"Item: {row['Item']}")
    print(f"Sodium: {row['Sodium (mg)']:.0f}mg")
    print("-" * 50)
Items exceeding the 2000mg sodium limit:

Company: KFC
Item: Secret Recipe Fries (Family)
Sodium: 2890mg
--------------------------------------------------
Company: Burger King
Item: Spicy Chicken Nuggets- 20 pc
Sodium: 2840mg
--------------------------------------------------
Company: KFC
Item: BBQ Baked Beans (Family)
Sodium: 2810mg
--------------------------------------------------
Company: KFC
Item: Mashed Potatoes With Gravy (Family)
Sodium: 2590mg
--------------------------------------------------
Company: KFC
Item: KFC® Famous Bowl
Sodium: 2350mg
--------------------------------------------------
Company: McDonald’s
Item: Big Breakfast with Hotcakes (Large Size Biscuit)
Sodium: 2260mg
--------------------------------------------------
Company: Burger King
Item: BK™ Ultimate Breakfast Platter
Sodium: 2230mg
--------------------------------------------------
Company: KFC
Item: Macaroni & Cheese (Family)
Sodium: 2220mg
--------------------------------------------------
Company: Burger King
Item: Fully Loaded Biscuit
Sodium: 2190mg
--------------------------------------------------
Company: McDonald’s
Item: Big Breakfast with Hotcakes (Regular Size Biscuit)
Sodium: 2150mg
--------------------------------------------------
Company: Burger King
Item: Bacon King Sandwich
Sodium: 2150mg
--------------------------------------------------
Company: KFC
Item: Spicy Chicken Sandwich
Sodium: 2140mg
--------------------------------------------------
Company: McDonald’s
Item: Angus Bacon & Cheese
Sodium: 2070mg
--------------------------------------------------
Company: McDonald’s
Item: Angus Chipotle BBQ Bacon
Sodium: 2020mg
--------------------------------------------------
Company: Wendy’s
Item: 6 Piece Chicken Tenders
Sodium: 2020mg
--------------------------------------------------
Company: Wendy’s
Item: Two Sausage Biscuits
Sodium: 2020mg
--------------------------------------------------
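Tallying the listing above by company gives a quick summary. The company names below are read off the printed output in order; with the DataFrame available, `high_sodium["Company"].value_counts()` would produce the same tally.

```python
from collections import Counter

# Company of each item in the listing above, in printed order
companies = [
    "KFC", "Burger King", "KFC", "KFC", "KFC", "McDonald's", "Burger King", "KFC",
    "Burger King", "McDonald's", "Burger King", "KFC", "McDonald's", "McDonald's",
    "Wendy's", "Wendy's",
]

tally = Counter(companies)
for company, n in tally.most_common():
    print(f"{company}: {n}")
```

No Taco Bell or Pizza Hut item appears in the listing.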
Scatter Plot: Sodium Levels Across Fast Food Companies¶
Code Description¶
The code generates a scatter plot to visualize the relationship between sodium levels and calorie content for various fast food companies. Each subplot represents a different company, allowing for direct comparison. Key features of the code include:
- Facet Plotting: Uses facet_col to create separate plots for each company, arranged in rows with three plots per row.
- Sodium Limit Line: A horizontal dashed line at 2000mg serves as the sodium benchmark for evaluating menu items. It is slightly stricter than the 2300mg daily limit quoted in the listing above.
- Data Preprocessing: Ensures numeric conversion of sodium and calorie values, and removes duplicates for accurate plotting.
- Customization: Adjusts layout for readability, including axis titles, background color, and grid lines.
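The preprocessing step can be illustrated on a toy frame. This is a minimal sketch with made-up rows (none of the values come from FastFoodNutritionMenuV2.csv): duplicate rows are dropped, and `pd.to_numeric(..., errors="coerce")` turns entries that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "Company": ["A", "A", "B", "B"],
    "Item": ["Fries", "Fries", "Wrap", "Shake"],
    "Calories": ["320", "320", "410", "n/a"],
    "Sodium (mg)": ["260", "260", "<5", "180"],
})

# The two identical "Fries" rows collapse into one
toy = toy.drop_duplicates()

# Strings such as "n/a" or "<5" cannot be parsed and become NaN
for col in ["Calories", "Sodium (mg)"]:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

# Count how many values were lost to coercion in each column
n_missing = toy[["Calories", "Sodium (mg)"]].isna().sum()
print(toy)
print(n_missing)
```

Checking `n_missing` after coercion shows how many menu items will be absent from the plot because a value could not be read as a number.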
Plot Insights¶
- Nutritional Comparison: The scatter plot highlights variations in sodium content relative to calories for each company, making it easy to identify items that exceed recommended sodium levels.
- Sodium Exceedance: Points above the red line mark items over 2000mg. In the listing above these come from KFC, Burger King, McDonald's, and Wendy's, with KFC's family-size sides the highest (up to 2890mg).
- Calorie Correlation: The plot also reveals how calorie content correlates with sodium levels, providing insights into the nutritional profile of fast food offerings.
- Company-Specific Trends: Each facet allows for an in-depth view of how different brands compare in terms of high-sodium and high-calorie items.
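The visual impressions above can be backed with numbers. The sketch below uses hypothetical rows (not values from the dataset) and the same column names as the notebook; it counts items above the 2000mg line per company and computes the Pearson correlation between calories and sodium within each company.

```python
import pandas as pd

SODIUM_LINE_MG = 2000  # same value as the dashed line in the plot

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B"],
    "Calories": [200, 600, 1000, 300, 500, 700],
    "Sodium (mg)": [400, 1200, 2400, 900, 2100, 2500],
})

# Number of items above the benchmark line, per company
above = (toy["Sodium (mg)"] > SODIUM_LINE_MG).groupby(toy["Company"]).sum()

# Pearson correlation between calories and sodium, per company
corr = {
    name: group["Calories"].corr(group["Sodium (mg)"])
    for name, group in toy.groupby("Company")
}

print(above)
print(corr)
```

Applied to the real `df`, the same two steps would give a per-company count and correlation to set beside each facet of the plot.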
def create_sodium_scatter(df):
    """Faceted scatter plot of sodium vs. calories, one panel per company."""
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sodium (mg)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,  # 3 plots per row
        title="Scatter Plot: Sodium Levels",
        height=800,
        width=1200,
        hover_data=["Item"]
    )
    # Add horizontal benchmark line for sodium (2000mg)
    scatter.add_hline(
        y=2000,
        line_dash="dash",
        line_color="red",
        annotation_text="Sodium Limit (2000mg)",
        line_width=1,
        annotation=dict(
            font=dict(color="red", size=10),
            yshift=10
        )
    )
    # Customize layout
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sodium (mg)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)'
    )
    # Show only the company name in each facet title (strip the "Company=" prefix)
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    # Update axes ranges and grid
    scatter.update_yaxes(range=[0, 3000], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    return scatter

# Load and preprocess data
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
df = pd.read_csv(FILE_PATH)
df.columns = [name.replace('\n', " ") for name in df.columns]
df.drop_duplicates(inplace=True)
# Convert sodium and calorie values to numeric (unparseable entries become NaN)
df["Sodium (mg)"] = pd.to_numeric(df["Sodium (mg)"], errors='coerce')
df["Calories"] = pd.to_numeric(df["Calories"], errors='coerce')
# Sort the data by "Calories" for proper x-axis ordering
df = df.sort_values(by="Calories")
# Create and display the plot
sodium_scatter = create_sodium_scatter(df)
sodium_scatter.show()
Scatter Plot: Sugar Levels Across Fast Food Companies¶
Code Description¶
This code creates a scatter plot to visualize sugar levels in relation to calorie content for various fast food companies. Each subplot represents a different company, allowing for direct comparison. Key features of the code include:
- Facet Plotting: Utilizes facet_col to create separate plots for each company, arranged in rows with three plots per row.
- Sugar Limit Lines: Two horizontal dashed lines mark the recommended daily limits for added sugar: 25g for women (red) and 36g for men (orange). Because the plotted Sugars (g) column may include naturally occurring sugar, the lines serve as an approximate benchmark.
- Data Preprocessing: Ensures numeric conversion of sugar and calorie values, and removes duplicates for accurate plotting.
- Customization: Adjusts layout for readability, including axis titles, background color, and grid lines.
Plot Insights¶
- Nutritional Comparison: The scatter plot highlights variations in sugar content relative to calories for each company, making it easy to identify items that exceed recommended sugar limits.
- Exceeding Sugar Limits: Many items from KFC and McDonald's surpass the recommended sugar limits, as indicated by points above the dashed lines.
- Calorie Correlation: The plot also reveals how calorie content correlates with sugar levels, providing insights into the nutritional profile of fast food offerings.
- Company-Specific Trends: Each facet allows for an in-depth view of how different brands compare in terms of high-sugar and high-calorie items.
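How many items sit above each dashed line can also be counted directly. This is a small sketch on hypothetical rows (not values from the dataset), using the notebook's `Sugars (g)` column name and the two limits drawn in the plot; it reports the share of each company's items above 25g and above 36g.

```python
import pandas as pd

LIMIT_WOMEN_G = 25  # red dashed line in the plot
LIMIT_MEN_G = 36    # orange dashed line in the plot

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "Company": ["A", "A", "A", "A", "B", "B"],
    "Sugars (g)": [5, 26, 30, 40, 10, 50],
})

# Flag items above each limit
toy["above_25g"] = toy["Sugars (g)"] > LIMIT_WOMEN_G
toy["above_36g"] = toy["Sugars (g)"] > LIMIT_MEN_G

# The mean of a boolean column is the share of items that are True
summary = toy.groupby("Company")[["above_25g", "above_36g"]].mean()
print(summary)
```

Shares are easier to compare across companies than raw counts, since the chains do not have the same number of menu items.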
def create_sugar_scatter(df):
    """Faceted scatter plot of sugar vs. calories, one panel per company."""
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sugars (g)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,  # 3 plots per row
        title="Scatter Plot: Sugar Levels",
        height=800,
        width=1150,
        hover_data=["Item"]
    )
    # Add horizontal line for the women's added sugar limit (25g)
    scatter.add_hline(
        y=25,
        line_dash="dash",
        line_color="red",
        annotation_text="Women's Added Sugar Limit (25g)",
        line_width=1,
        annotation=dict(
            font=dict(color="red", size=10),
            yshift=10,
            xshift=0
        )
    )
    # Add second horizontal line for the men's added sugar limit (36g)
    scatter.add_hline(
        y=36,
        line_dash="dash",
        line_color="orange",
        annotation_text="Men's Added Sugar Limit (36g)",
        line_width=1,
        annotation=dict(
            font=dict(color="orange", size=10),
            yshift=-20,
            xshift=0
        )
    )
    # Customize layout
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sugar (g)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)'
    )
    # Show only the company name in each facet title (strip the "Company=" prefix)
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    # Update axes ranges and grid
    scatter.update_yaxes(range=[0, 65], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    return scatter

# Load and preprocess data
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
df = pd.read_csv(FILE_PATH)
df.columns = [name.replace('\n', " ") for name in df.columns]
df.drop_duplicates(inplace=True)
# Convert sugar and calorie values to numeric (unparseable entries become NaN)
df["Sugars (g)"] = pd.to_numeric(df["Sugars (g)"], errors='coerce')
df["Calories"] = pd.to_numeric(df["Calories"], errors='coerce')
# Sort the data by "Calories" for proper x-axis ordering
df = df.sort_values(by="Calories")
# Create and display the plot
sugar_scatter = create_sugar_scatter(df)
sugar_scatter.show()
Interactive Map of 10,000 Fast Food Restaurants in the United States¶
Code Description¶
The provided code uses the Folium library to create an interactive map that visualizes the locations of 10,000 fast food restaurants across the United States. Key features of the code include:
Map Initialization:
- The map is centered on the geographical center of the U.S. (latitude: 39.8283, longitude: -98.5795) with a default zoom level of 4.
Marker Clusters:
- FastMarkerCluster: Efficiently renders large datasets by clustering markers dynamically for better performance.
- MarkerCluster: Adds detailed markers with popups containing restaurant-specific information, such as name, address, city, province, and categories.
Popups and Tooltips:
- Each marker includes a popup displaying detailed restaurant information and a tooltip showing the restaurant's name for quick identification.
Layer Control:
- A layer control widget allows users to toggle between different layers (e.g., clusters) for better interaction.
Customization:
- Markers are styled with red icons and a "cutlery" symbol to represent food-related locations.
Output:
- The map is rendered inline in the notebook. Calling m.save() with an output path would also write it to a standalone HTML file that can be opened in a web browser.
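Markers need numeric coordinates, so a defensive filter before plotting may be worthwhile. Whether fastfood.csv actually contains missing or out-of-range coordinates is not stated here, so the sketch below is a precaution shown on hypothetical rows, using the `latitude` and `longitude` column names from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "name": ["R1", "R2", "R3", "R4"],
    "latitude": [34.05, None, 40.71, 95.0],
    "longitude": [-118.24, -97.0, None, -74.0],
})

# Drop rows where either coordinate is missing
valid = toy.dropna(subset=["latitude", "longitude"])

# Keep only coordinates inside the valid geographic ranges
in_range = valid["latitude"].between(-90, 90) & valid["longitude"].between(-180, 180)
valid = valid[in_range]

print(f"Kept {len(valid)} of {len(toy)} rows")
```

Running the same filter on the real `df` before building the clusters would keep a single bad row from interrupting the marker loop.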
Insights from the Map¶
Geographic Distribution:
- The map reveals that fast food restaurants are densely clustered in urban areas and along major highways, reflecting their accessibility and convenience for travelers and city dwellers.
Regional Hotspots:
- States like California, Texas, and Florida show significant concentrations of fast food chains, reflecting their large populations and demand for quick-service dining options.
Category Diversity:
- The "categories" field in the popup indicates a variety of offerings, from burgers and pizza to specialty cuisines, showcasing the diversity in fast food menus across different regions.
Rural vs. Urban Presence:
- While urban areas dominate in terms of density, rural regions also have scattered fast food outlets, indicating their importance as essential dining options in less populated areas.
Potential Health Implications:
- The widespread availability of fast food across the country underscores its role in shaping dietary habits and public health outcomes, particularly in areas with limited access to healthier alternatives.
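The regional hotspot observation can be checked by counting locations per state. This is a minimal sketch on hypothetical rows, assuming the state code is stored in the `province` column as in the popup code; the counts shown are not results from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "name": ["R1", "R2", "R3", "R4", "R5", "R6"],
    "province": ["CA", "CA", "TX", "CA", "FL", "TX"],
})

# Number of restaurants per state, sorted from most to fewest
per_state = toy["province"].value_counts()

# The states with the most locations
top_states = per_state.head(3)
print(top_states)
```

Raw counts favor populous states. A per-capita comparison would need state population figures, which are not part of this dataset.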
import folium
import pandas as pd
from folium.plugins import MarkerCluster, FastMarkerCluster

# Load the data
df = pd.read_csv('fastfood.csv')

# Create a map centered on the United States
m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)

# Create a FastMarkerCluster for efficient rendering of many points
# (named so that it gets a readable entry in the layer control)
fastmarker_cluster = FastMarkerCluster(
    data=list(zip(df['latitude'], df['longitude'])),
    name="Fast Food Restaurants (fast clusters)"
)
fastmarker_cluster.add_to(m)

# Create a regular MarkerCluster for more detailed information
marker_cluster = MarkerCluster(name="Fast Food Restaurants")

# Add a marker with popup and tooltip for each restaurant
for _, row in df.iterrows():
    popup_content = f"""
    <b>{row['name']}</b><br>
    Address: {row['address']}<br>
    City: {row['city']}<br>
    Province: {row['province']}<br>
    Categories: {row['categories']}
    """
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=folium.Popup(popup_content, max_width=300),
        tooltip=row['name'],
        icon=folium.Icon(color='red', icon='cutlery', prefix='fa')
    ).add_to(marker_cluster)
marker_cluster.add_to(m)

# Add layer control
folium.LayerControl().add_to(m)

# Display the map (as the last expression in a notebook cell, it renders inline)
m